QC Report


general
Report generated at    2023-03-11 10:39:17
Title                  ht22 ATAC-seq mini-encode
Description            ht22 ATAC-seq mini-encode test run
Pipeline version       v2.2.1
Pipeline type          atac
Genome                 mm10
Aligner                bowtie2
Sequencing endedness   {'rep1': {'paired_end': True}, 'rep2': {'paired_end': True}, 'rep3': {'paired_end': True}}
Peak caller            macs2

Alignment quality metrics


SAMstat (raw unfiltered BAM)

                                    rep1                rep2                rep3
Total Reads                         301779192           256710874           162286922
Total Reads (QC-failed)             0                   0                   0
Duplicate Reads                     0                   0                   0
Duplicate Reads (QC-failed)         0                   0                   0
Mapped Reads                        294205880           248809664           156910174
Mapped Reads (QC-failed)            0                   0                   0
% Mapped Reads                      97.5                96.89999999999999   96.7
Paired Reads                        301779192           256710874           162286922
Paired Reads (QC-failed)            0                   0                   0
Read1                               150889596           128355437           81143461
Read1 (QC-failed)                   0                   0                   0
Read2                               150889596           128355437           81143461
Read2 (QC-failed)                   0                   0                   0
Properly Paired Reads               292527390           243826914           154325150
Properly Paired Reads (QC-failed)   0                   0                   0
% Properly Paired Reads             96.89999999999999   95.0                95.1
With itself                         292569910           245964602           155121700
With itself (QC-failed)             0                   0                   0
Singletons                          1635970             2845062             1788474
Singletons (QC-failed)              0                   0                   0
% Singleton                         0.5                 1.0999999999999999  1.0999999999999999
Diff. Chroms                        30911               61783               51041
Diff. Chroms (QC-failed)            0                   0                   0

Marking duplicates (filtered BAM)

                                rep1        rep2        rep3
Unpaired Reads                  0           0           0
Paired Reads                    112389616   94248968    59862901
Unmapped Reads                  0           0           0
Unpaired Duplicate Reads        0           0           0
Paired Duplicate Reads          38514725    14286811    8365304
Paired Optical Duplicate Reads  743516      928340      462407
% Duplicate Reads               34.2689     15.1586     13.9741

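Reads are filtered with samtools flag 1804 (samtools view -F 1804), which excludes unmapped reads, reads with an unmapped mate, non-primary alignments, reads failing platform/vendor quality checks, and PCR or optical duplicates.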
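
The value 1804 is a bitwise sum of standard SAM flag bits, so the filter can be unpacked mechanically. The sketch below is illustrative only: the bit meanings follow the SAM format specification, while `decode_flag` and `is_excluded` are hypothetical helper names and not part of the pipeline.

```python
# Standard SAM flag bits, as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all SAM flag bits set in `flag`, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if flag & bit]


def is_excluded(read_flag, exclude_mask=1804):
    """Mimic `samtools view -F mask`: a read is dropped if it has ANY masked bit set."""
    return (read_flag & exclude_mask) != 0


if __name__ == "__main__":
    # 1804 = 4 + 8 + 256 + 512 + 1024
    for name in decode_flag(1804):
        print(name)
```

A read with flag 99 (paired, properly paired, mate on the reverse strand, first in pair) has none of the masked bits set and is kept; the same read marked as a duplicate (flag 1123) is dropped.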


Fraction of mitochondrial reads (unfiltered BAM)

                                            rep1                  rep2                  rep3
Rn = Number of Non-mitochondrial Reads      292681750             247276636             156369615
Rm = Number of Mitochondrial Reads          4207083               4134932               1470032
Rm/(Rn+Rm) = Frac. of mitochondrial reads   0.01417056666459395   0.016446864529320305  0.009313452151853837
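
The last row can be recomputed from the two count rows. A minimal sketch, using the rep1 to rep3 counts from this table; `mito_fraction` is a hypothetical helper name, not a pipeline function.

```python
def mito_fraction(rn, rm):
    """Fraction of mitochondrial reads, Rm / (Rn + Rm)."""
    total = rn + rm
    if total == 0:
        raise ValueError("no reads to compute a fraction from")
    return rm / total


if __name__ == "__main__":
    # (Rn, Rm) pairs taken from the table in this report.
    counts = {
        "rep1": (292681750, 4207083),
        "rep2": (247276636, 4134932),
        "rep3": (156369615, 1470032),
    }
    for rep, (rn, rm) in counts.items():
        print(f"{rep}: {mito_fraction(rn, rm):.6f}")
```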

SAMstat (filtered/deduped BAM)

                                    rep1                rep2                rep3
Total Reads                         147487386           159268434           102667862
Total Reads (QC-failed)             0                   0                   0
Duplicate Reads                     0                   0                   0
Duplicate Reads (QC-failed)         0                   0                   0
Mapped Reads                        147487386           159268434           102667862
Mapped Reads (QC-failed)            0                   0                   0
% Mapped Reads                      100.0               100.0               100.0
Paired Reads                        147487386           159268434           102667862
Paired Reads (QC-failed)            0                   0                   0
Read1                               73743693            79634217            51333931
Read1 (QC-failed)                   0                   0                   0
Read2                               73743693            79634217            51333931
Read2 (QC-failed)                   0                   0                   0
Properly Paired Reads               147487386           159268434           102667862
Properly Paired Reads (QC-failed)   0                   0                   0
% Properly Paired Reads             100.0               100.0               100.0
With itself                         147487386           159268434           102667862
With itself (QC-failed)             0                   0                   0
Singletons                          0                   0                   0
Singletons (QC-failed)              0                   0                   0
% Singleton                         0.0                 0.0                 0.0
Diff. Chroms                        0                   0                   0
Diff. Chroms (QC-failed)            0                   0                   0

Reads are filtered and duplicates are removed. Subsampling with atac.subsample_reads is not performed in the alignment steps. The deduplicated (nodup) BAM is later converted to a BED-type format (TAGALIGN), and the TAGALIGN is then subsampled with that parameter in the peak-calling step.

Fragment length statistics (filtered/deduped BAM)

                                      rep1                rep2                rep3
Fraction of reads in NFR              0.9993395896975993  0.8476954749632342  0.9199384409576455
Fraction of reads in NFR (QC pass)    True                True                True
Fraction of reads in NFR (QC reason)  OK                  OK                  OK
NFR / mono-nuc reads                  1882.6752072343631  5.803496499665853   11.768652859021797
NFR / mono-nuc reads (QC pass)        True                True                True
NFR / mono-nuc reads (QC reason)      OK                  OK                  OK
Presence of NFR peak                  True                True                True
Presence of Mono-Nuc peak             False               False               False
Presence of Di-Nuc peak               True                False               False

[Fragment length distribution plots: rep1, rep2, rep3]

Open chromatin assays show distinct fragment length enrichments because the cut sites lie only in open chromatin and not in nucleosomes. As a result, peaks representing different n-nucleosomal (e.g., mono-nucleosomal, di-nucleosomal) fragment lengths arise. Good libraries show these peaks in the fragment length distribution and show specific peak ratios.



Sequence quality metrics (filtered/deduped BAM)

[GC bias plots: rep1, rep2, rep3]

Open chromatin assays are known to have significant GC bias. Please take this into consideration as necessary.


Annotated genomic region enrichment

                                            rep1                   rep2                   rep3
Fraction of Reads in universal DHS regions  0.36001564906710054    0.36051699359334444    0.3796856995035116
Fraction of Reads in blacklist regions      0.0009586650345813302  0.0016784430742880288  0.00201200254856773
Fraction of Reads in promoter regions       0.02446430232345429    0.02795852817891083    0.03228157220221455
Fraction of Reads in enhancer regions       0.3335907858587988     0.3302254481889362     0.34447562568313733

Signal to noise can be assessed by checking whether reads fall into known open regions (such as DHS regions). A high fraction of reads should fall into the universal (across cell types) DHS set, and only a small fraction should fall into the blacklist regions. A large share of reads (though not all) should fall into the promoter regions, and likewise into the enhancer regions. The promoter regions should not take up all reads, as open chromatin assays are known to have a bias for promoters.


Library complexity quality metrics


Library complexity (filtered non-mito BAM)

                         rep1       rep2       rep3
Total Fragments          110773147  92609434   59293451
Distinct Fragments       75193012   80125720   51614861
Positions with Two Read  16055348   9296068    5782986
NRF = Distinct/Total     0.678802   0.8652     0.870499
PBC1 = OneRead/Distinct  0.684379   0.866793   0.871808
PBC2 = OneRead/TwoRead   3.205195   7.471163   7.781147

Mitochondrial reads are filtered out by default. The non-redundant fraction (NRF) is the fraction of non-redundant mapped reads in a dataset; it is the ratio between the number of positions in the genome that uniquely mapped reads map to and the total number of uniquely mappable reads. The NRF should be > 0.8. The PBC1 is the ratio of genomic locations with EXACTLY one read pair over the genomic locations with AT LEAST one read pair. PBC1 is the primary measure, and the PBC1 should be close to 1. Provisionally 0-0.5 is severe bottlenecking, 0.5-0.8 is moderate bottlenecking, 0.8-0.9 is mild bottlenecking, and 0.9-1.0 is no bottlenecking. The PBC2 is the ratio of genomic locations with EXACTLY one read pair over the genomic locations with EXACTLY two read pairs. The PBC2 should be significantly greater than 1.
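
The three ratios and the provisional PBC1 ranges described above can be written out as a short sketch. The function names are hypothetical, and because the stated ranges share their endpoints, the code assumes each range includes its lower bound. The report does not list the one-read position counts, so the example recomputes only NRF from the table and classifies the reported PBC1 values.

```python
def nrf(distinct_fragments, total_fragments):
    """Non-redundant fraction: distinct fragments / total fragments."""
    return distinct_fragments / total_fragments


def pbc1(one_read_positions, distinct_positions):
    """Positions with exactly one read pair / positions with at least one."""
    return one_read_positions / distinct_positions


def pbc2(one_read_positions, two_read_positions):
    """Positions with exactly one read pair / positions with exactly two."""
    return one_read_positions / two_read_positions


def bottleneck_level(pbc1_value):
    """Map PBC1 to the provisional categories quoted in the report.

    Assumption: each range includes its lower bound (0.5 counts as moderate).
    """
    if pbc1_value < 0.5:
        return "severe"
    if pbc1_value < 0.8:
        return "moderate"
    if pbc1_value < 0.9:
        return "mild"
    return "none"


if __name__ == "__main__":
    # (total fragments, distinct fragments) and PBC1 as reported in the table.
    fragments = {
        "rep1": (110773147, 75193012),
        "rep2": (92609434, 80125720),
        "rep3": (59293451, 51614861),
    }
    reported_pbc1 = {"rep1": 0.684379, "rep2": 0.866793, "rep3": 0.871808}
    for rep, (total, distinct) in fragments.items():
        level = bottleneck_level(reported_pbc1[rep])
        print(f"{rep}: NRF={nrf(distinct, total):.6f} bottlenecking={level}")
```

Under these assumptions rep1 falls in the moderate bottlenecking range and misses the NRF > 0.8 guideline, while rep2 and rep3 fall in the mild range.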


Fragment: read for a single-ended dataset, pair of reads for a paired-ended dataset
NRF: non redundant fraction
PBC1: PCR Bottleneck coefficient 1
PBC2: PCR Bottleneck coefficient 2


Replication quality metrics


IDR (Irreproducible Discovery Rate) plots

[IDR plots: rep1_vs_rep2, rep1_vs_rep3, rep2_vs_rep3, rep1-pr1_vs_rep1-pr2, rep2-pr1_vs_rep2-pr2, rep3-pr1_vs_rep3-pr2, pooled-pr1_vs_pooled-pr2]

Reproducibility QC and peak detection statistics

                        overlap                   idr
Nt                      116462                    18762
N1                      136101                    20120
N2                      126570                    19757
N3                      104949                    15352
Np                      168705                    36864
N optimal               168705                    36864
N conservative          116462                    18762
Optimal Set             pooled-pr1_vs_pooled-pr2  pooled-pr1_vs_pooled-pr2
Conservative Set        rep1_vs_rep3              rep2_vs_rep3
Rescue Ratio            1.4485840875135236        1.9648225135913016
Self Consistency Ratio  1.2968298888031329        1.310578426263679
Reproducibility Test    pass                      pass
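
The two ratios in this table are consistent with max/min ratios of the peak counts: Np against Nt for the rescue ratio, and the largest against the smallest of N1 to N3 for the self-consistency ratio. The sketch below reproduces them under that reading. The cutoff of 2 and the pass/borderline/fail labels are assumptions drawn from general ENCODE guidance, since this report does not state them, and the function names are hypothetical.

```python
def rescue_ratio(n_pooled, n_true):
    """max(Np, Nt) / min(Np, Nt)."""
    return max(n_pooled, n_true) / min(n_pooled, n_true)


def self_consistency_ratio(self_counts):
    """Largest over smallest self-consistent peak count (N1, N2, ...)."""
    return max(self_counts) / min(self_counts)


def reproducibility_test(rescue, self_consistency, threshold=2.0):
    """Assumed rule: pass if neither ratio exceeds the threshold,
    borderline if exactly one does, fail if both do."""
    exceeded = int(rescue > threshold) + int(self_consistency > threshold)
    return ("pass", "borderline", "fail")[exceeded]


if __name__ == "__main__":
    # Peak counts from the table in this report: (Nt, [N1, N2, N3], Np).
    columns = {
        "overlap": (116462, [136101, 126570, 104949], 168705),
        "idr": (18762, [20120, 19757, 15352], 36864),
    }
    for name, (nt, self_counts, n_pooled) in columns.items():
        rr = rescue_ratio(n_pooled, nt)
        sc = self_consistency_ratio(self_counts)
        print(f"{name}: rescue={rr:.4f} self-consistency={sc:.4f} "
              f"test={reproducibility_test(rr, sc)}")
```

Note that the IDR rescue ratio (about 1.96) sits just under the assumed cutoff, so a small change in Np or Nt would move that column to borderline.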

Reproducibility QC


Number of raw peaks

                 rep1    rep2    rep3
Number of peaks  299456  299470  299458

The number of peaks is capped at 300000
Peaks are called from macs2 with p-val threshold 0.01

Peak calling statistics


Peak region size

                        rep1              rep2               rep3             idr_opt            overlap_opt
Min size                150.0             150.0              150.0            160.0              150.0
25 percentile           178.0             185.0              171.0            316.0              231.0
50 percentile (median)  211.0             245.0              221.0            448.0              324.0
75 percentile           301.0             369.0              332.0            630.0              475.0
Max size                2340.0            2457.0             2086.0           2452.0             3214.0
Mean                    261.484181315452  305.6386950278826  277.94504738561  509.5519748263889  387.6935360540589

[Peak region size plots: rep1, rep2, rep3, idr_opt, overlap_opt]

Enrichment / Signal-to-noise ratio


Strand cross-correlation measures (filtered BAM)

                                                  rep1              rep2               rep3
Number of Subsampled Reads                        12500000          12500000           12500000
Estimated Fragment Length                         0                 0                  0
Cross-correlation at Estimated Fragment Length    0.17309035333131  0.182730706614741  0.180868064800158
Phantom Peak                                      40                65                 50
Cross-correlation at Phantom Peak                 0.176399          0.1687966          0.1692393
Argmin of Cross-correlation                       1500              1500               1500
Minimum of Cross-correlation                      0.15066           0.1597217          0.1594195
NSC (Normalized Strand Cross-correlation coeff.)  1.148881          1.144057           1.134542
RSC (Relative Strand Cross-correlation coeff.)    0.8714526         2.535462           2.18422


Performed on subsampled (25000000) reads. Such FASTQ trimming is for cross-correlation analysis only.
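
The NSC and RSC rows can be recomputed from the three cross-correlation rows using the usual definitions of these coefficients: NSC is the cross-correlation at the estimated fragment length divided by the minimum, and RSC is the background-subtracted ratio of the fragment-length value to the phantom-peak value. The sketch below applies them to the rep1 to rep3 values from this table; the function names are hypothetical, and small differences from the reported figures are expected because the table values are rounded.

```python
def nsc(cc_fragment, cc_min):
    """Normalized strand cross-correlation coefficient."""
    return cc_fragment / cc_min


def rsc(cc_fragment, cc_phantom, cc_min):
    """Relative strand cross-correlation coefficient."""
    return (cc_fragment - cc_min) / (cc_phantom - cc_min)


if __name__ == "__main__":
    # (cc at estimated fragment length, cc at phantom peak, minimum cc)
    xcor = {
        "rep1": (0.17309035333131, 0.176399, 0.15066),
        "rep2": (0.182730706614741, 0.1687966, 0.1597217),
        "rep3": (0.180868064800158, 0.1692393, 0.1594195),
    }
    for rep, (cc_frag, cc_phantom, cc_min) in xcor.items():
        print(f"{rep}: NSC={nsc(cc_frag, cc_min):.4f} "
              f"RSC={rsc(cc_frag, cc_phantom, cc_min):.4f}")
```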


[Strand cross-correlation plots: rep1, rep2, rep3]

TSS enrichment (filtered/deduped BAM)

                rep1               rep2                rep3
TSS enrichment  3.122141923303737  6.3331474449049425  4.79470448688959

[TSS enrichment plots: rep1, rep2, rep3]

Open chromatin assays should show enrichment in open chromatin sites, such as TSSs. An average TSS enrichment in human (hg19) is above 6. A strong TSS enrichment is above 10. For other references, please see https://www.encodeproject.org/atac-seq/


Jensen-Shannon distance (filtered/deduped BAM)

                       rep1                 rep2                 rep3
AUC                    0.25765593542304766  0.2838827661987303   0.27447123443696314
Synthetic AUC          0.49628349055259524  0.4971632444397567   0.4962147028865357
X-intercept            0.15902905671167847  0.14314194816543388  0.15556768668102405
Synthetic X-intercept  0.0                  0.0                  0.0
Elbow Point            0.573317656186574    0.5444309135737296   0.5508937373398042
Synthetic Elbow Point  0.5066950327307322   0.5041903338140633   0.5064545074302387
Synthetic JS Distance  0.29386449490451394  0.2606668514302756   0.26577089471170084

Peak enrichment


Fraction of reads in peaks (FRiP)

FRiP for macs2 raw peaks

Peak set      Fraction of Reads in Peaks
rep1          0.1449935725350777
rep2          0.13571182598555592
rep3          0.13294893586076625
rep1-pr1      0.14945795907647372
rep2-pr1      0.14044001788276492
rep3-pr1      0.1408635130463024
rep1-pr2      0.148297321484799
rep2-pr2      0.13921656489969086
rep3-pr2      0.14321965997927685
pooled        0.13472665218227411
pooled-pr1    0.1322271025998867
pooled-pr2    0.13088013991648104

FRiP for overlap peaks

Peak set                  Fraction of Reads in Peaks
rep1_vs_rep2              0.061338501176392626
rep1_vs_rep3              0.06644962955513647
rep2_vs_rep3              0.06154695272365803
rep1-pr1_vs_rep1-pr2      0.08175726295671143
rep2-pr1_vs_rep2-pr2      0.07484930755330965
rep3-pr1_vs_rep3-pr2      0.06262105662626928
pooled-pr1_vs_pooled-pr2  0.08964799207682374

FRiP for IDR peaks

Peak set                  Fraction of Reads in Peaks
rep1_vs_rep2              0.006553651188159653
rep1_vs_rep3              0.011176371082511051
rep2_vs_rep3              0.01875814550463644
rep1-pr1_vs_rep1-pr2      0.020457173198526957
rep2-pr1_vs_rep2-pr2      0.022380593005642286
rep3-pr1_vs_rep3-pr2      0.01814295110187451
pooled-pr1_vs_pooled-pr2  0.030807184719715358

For macs2 raw peaks:


For overlap/IDR peaks: